Bayesian sampling of integration dates in the
latent HIV reservoir
Roux-Cil Ferreira, Emmanuel Wong and Art
Poon
Department of Pathology and Laboratory Medicine,
Western University, Canada
Dating HIV integration events
-
Replication-competent provirus from the latent reservoir can be
sequenced from viral outgrowth experiments.
-
One approach is to build a phylogeny and map outgrowth sequences to
dated plasma HIV RNA.
-
Requires extensive sampling of HIV-1 plasma RNA pre-ART - uncertainty in
the tree and location of root.
|
Image source: Abrahams et al. (2019) The
replication-competent HIV-1 latent reservoir is primarily established
near the time of therapy initiation. Science Trans Med 11(513).
|
Root-to-tip regression
-
Under a strict molecular clock, the expected number of mutations
increases linearly with time.
-
A root-to-tip
(RTT) method regresses divergence from the root (\(y\)-axis) on sample collection dates (\(x\)-axis).
-
Fit regression to HIV RNA only, use model to predict integration dates.
-
May be more robust to uncertainty in tree (dating is a function of
lineage-specific divergence).
|
Image
source: Buonagurio DA, et al.. Evolution of human influenza A
viruses over 50 years: rapid, uniform rate of change in NS gene.
Science. 1986 May 23;232(4753):980-2.
|
Problems with RTT dating
-
Ignores variation in number of mutations — there is one, and only one,
estimated date.
-
Unclear how to deal with divergent outgrowth sequences.
-
Proviral sequences can be mapped to post-ART period or the future
(right, red zone).
-
Assumes location of root and clock rate are known without error.
|
|
Poisson likelihood
- A strict clock assumes that mutations accumulate at a constant rate
\(\mu\) over time, so variation is
equal to the mean (Poisson).
- Let \(Y_i\) be the number of
mutations in the \(i^{\textrm{th}}\)
sequence, given location of root.
- Let \(\Delta t_i\) be the time
elapsed between the \(i^{\textrm{th}}\)
sample and the root.
- The log-likelihood for RNA set \(\{Y_i,
\Delta t_i\}\) is:
\[\log L(Y_i, \Delta t_i) = \sum_i
Y_i\log(\mu \Delta t_i) - \mu \Delta t_i - \log
\Gamma(Y_i+1)\]
- This is the Langley-Fitch (1974) model.
Metropolis-Hastings sampling
| Root |
Uniform |
\(\textrm{Unif}(-\delta,
+\delta)\), reflection on tips and random choice of branches at
splits. |
| Clock rate |
Lognormal |
\(\textrm{Unif}(-\delta, +\delta)\)
proposal reflecting on zero. |
| Origin date |
Uniform |
Truncated normal proposal with mean 0 and variance \(\sigma\). |
Sampling integration dates
Using Bayes’ rule, the probability of integration time \(t_i\) for outgrowth sequence with
divergence \(Y_i\) is: \[P(t_i|Y_i) = \frac{P(Y_i|t_i)
P(t_i)}{P(Y_i)}\]
We assume \(P(t_i) =
\frac{1}{T-t_0}\), where \(t_0\)
is the origin date and \(T\) is the
maximum integration date, and letting \(s=t-t_0\):
\[
P(Y_i) = \frac{\int_0^{T-t_0} (\mu s)^{Y_i} \exp(-\mu s)
\mathrm{d}s}{(T-t_0)\Gamma(Y_i+1)}
= \frac{\gamma(Y_i+1, \mu(T-t_0))}{\mu(T-t_0)\Gamma(Y_i+1)}
\]
where \(\gamma(s, x)\) is the lower
incomplete gamma function.
Sampling integration dates (2)
Letting \(\Lambda=\mu(T-t_0)\),
the probability of \(t_i\) given \(y_i\) mutations simplifies to: \[
P(t_i | Y_i) = \frac{\mu \Lambda^{y_i}\exp(-\Lambda)}{\gamma(Y_i+1,
\Lambda)}
\]
Since we can’t solve for the inverse CDF, we use a simple
rejection method to sample integration times.
\(t_0\), \(y_i\) and \(\mu\) are sampled from the posterior
distribution.
Simulation model
- We used an exact stochastic simulation module in R
(
twt) to simulate cell population dynamics (forward
time).
- Discrete events with exponential waiting times determine population
dynamics over time.

Simulating population dynamics
-
Branching events require a “source” cell to induce a “target” cell to
change state.
-
Transmission of virus from an infected cell to a susceptible cell.
-
Virus replication completely blocked by ART initiation.
-
Transition events occur when a cell switches from one state
(compartment) to another.
-
e.g., reactivation of a latently-infected cell.
|
|
Simulation scenarios
- Early sampling:
- Sampled 10 HIV-1 RNA lineages at times \(t=3\), 6 and 9 post-infection.
- Initiated ART at time \(t=10\).
- Sampled 10 HIV-1 DNA lineages at time \(t=20\).
- Later sampling:
- Sampled 10 HIV-1 RNA lineages at times \(t=11\), 13 and 15 post-infection.
- Initated ART at time \(t=15\).
- Sampled 10 HIV-1 DNA lineages at time \(t=20\).
Simulating trees
-
From the population size trajectories,
twt samples trees in
reverse time for given sampling times and cell types.
-
▲ = branching event (infection from active cell, or clonal expansion of
latent cell)
-
○ = state transition (from active to latent or vice versa)
-
Retaining these tree annotations lets us collapse branches associated
with latently-infected cells.
|
|
Simulating evolution
- Ran trees from
twt through INDELible
- TN93 nucleotide substitution model with 40%
As.
- Seeded with an HIV-1 subtype C sequence at the root (AY772699)
- Reconstructed trees from simulated alignments with FastTree2 (double
precision build).
- Input trees to
bayroot, ran chain samples for \(2\times 10^4\) steps
- Discarded burn-in of 2,000 steps
- Thinned remaining chain down to 1,000 samples.
Comparing bayroot to root-to-tip regression
- For comparison, ran RTT on same input trees by censoring sampling
times associated with DNA tips
- Rooted tree using
rtt function in R package
ape
- Extracted root-to-tip distances from resulting tree
- Fit simple linear regression of these distances against sampling
times
- Calculated root mean square error between estimated (\(\hat{t}\)) and actual (\(t\)) integration dates:
\(\textrm{RMSE}=\sqrt{\left. \sum_{i=1}^n
(\hat{t}_i - t_i)^2 \right/ n}\)
Early sampling
When clock signal is strong, advantage of bayroot is
largely due to prior information about ART initiation (dashed line).
Integration date estimates for one replicate simulation.
|
Paired
Wilcoxon sign-rank test, \(P=3.55\times
10^{-4}\), \(n=50\).
|
Later sampling
-
Advantage of
bayroot driven by prior information about the
root.
-
With a less informative prior,
bayroot results become
similar to RTT.
|
Paired Wilcoxon sign-rank test, \(P=3.82\times
10^{-7}\), \(n=50\).
|
Application to real data
ZM1044M, Zambia-Emory HIV Research Project
Brooks et al., 2020, PLOS Pathog 16(6): e1008378.
Limitations
- We are assuming the input unrooted phylogeny is known without error.
- This assumption could be relaxed by hierarchical sampling (input a
posterior sample of trees)
- We assume the divergence of each sequence is an independent outcome.
- Sequences share common ancestors, will have a similar root-to-tip
distance because they inherit the same mutations.
- This could be addressed by adapting the covariance matrix of the
regression to the phylogenetic structure of the data.
- A simple and efficient means of incorporating prior knowledge and
managing uncertainty in dating the HIV reservoir.
- R package available from
https://github.com/PoonLab/bayroot
This study was supported by funding from CIHR and NIH (REACH
AI164565-01). RC was supported by an Ontario Genomics-CANSSI Fellowship
in Genome Data Science.